Abstract. Pseudo-Boolean constraints, also known as 0-1 Integer Linear
Constraints, are used to model many real-world problems. A common approach to
solving these constraints is to encode them into a SAT formula. The runtime of
the SAT solver on such a formula is sensitive to the manner in which the given
pseudo-Boolean constraints are encoded. In this paper, we propose the
Generalized Totalizer encoding (GTE), which is an arc-consistency preserving
extension of the Totalizer encoding to pseudo-Boolean constraints. Unlike some
other encodings, the number of auxiliary variables required for GTE does not
depend on the magnitudes of the coefficients. Instead, it depends on the number
of distinct combinations of these coefficients. We show the superiority of GTE
with respect to other encodings when large pseudo-Boolean constraints have a
low number of distinct coefficients. Our experimental results also show that
GTE remains competitive even when the pseudo-Boolean constraints do not have
this characteristic.


1 Introduction 

Pseudo-Boolean constraints (PBCs), or 0-1 Integer Linear constraints, have been
used to model a plethora of real-world problems such as computational biology,
upgradeability problems, resource allocation, scheduling [26] and automated
test pattern generation [22]. Due to their importance and wide range of
applications, a lot of research has been done to solve PBCs efficiently. One
popular approach is to convert PBCs into a SAT formula, thus making them
amenable to off-the-shelf SAT solvers. We start by formally introducing PBCs,
followed by a discussion on how to convert a PBC into a SAT formula.

A PBC is defined over a finite set of Boolean variables $x_1, \ldots, x_n$,
which can be assigned a value 0 (false) or 1 (true). A literal $l_i$ is either
a Boolean variable $x_i$ (positive literal) or its negation $\neg x_i$
(negative literal). A positive (resp. negative) literal $l_i$ is said to be
assigned 1 if and only if the corresponding variable $x_i$ is assigned 1
(resp. 0). Without loss of generality, a PBC can be defined as a linear
inequality of the following normal form:
$$\sum_{i} w_i l_i \le k \qquad (1)$$

                     (O : o2, o3, o5, o6 : 6)

        (A : a2, a3, a5 : 5)               (B : b3, b6 : 6)

  (C : l1 : 2)    (D : l2 : 3)       (E : l3 : 3)    (F : l4 : 3)

Fig. 1: Generalized Totalizer Encoding for $2l_1 + 3l_2 + 3l_3 + 3l_4 \le 5$

Here, $w_i \in \mathbb{N}^+$ are called coefficients or weights, $l_i$ are
input literals and $k \in \mathbb{N}^+$ is called the bound. Linear
inequalities in other forms (e.g. other inequalities, equalities or negative
coefficients) can be converted into this normal form in linear time [5]. A
cardinality constraint is a special case of a PBC in which all the weights have
the value 1. Many different encodings have been proposed to encode cardinality
constraints. Linear pseudo-Boolean solving (PBS) is a generalization of the SAT
formulation where constraints are not restricted to clauses and can be PBCs. A
problem related to PBS is linear pseudo-Boolean optimization (PBO), where all
the constraints must be satisfied and the value of a linear cost function is
optimized. PBO usually requires an iterative algorithm which solves a PBS
instance in every iteration. Considering that the focus of this paper is on
encodings rather than algorithms, we restrict ourselves to the decision problem
(PBS).
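The conversion to normal form mentioned above can be illustrated with a small
sketch. It is not the procedure of [5]; it only shows one standard way to
remove negative coefficients from a `<=` constraint, using the identity
x = 1 - (not x). The function names `normalize_leq` and `holds`, and the
DIMACS-style signed-integer literals, are assumptions made for this
illustration.

```python
def normalize_leq(terms, bound):
    """Rewrite sum(c * lit) <= bound so that every coefficient is positive.

    terms: list of (coefficient, literal); a literal is a non-zero int,
           +v for variable v and -v for its negation (DIMACS style).
    Returns (normalized_terms, normalized_bound).
    """
    merged = {}
    for coeff, lit in terms:
        if lit < 0:
            # c * (not x) = c - c * x, so move the constant c into the bound.
            bound -= coeff
            coeff, lit = -coeff, -lit
        merged[lit] = merged.get(lit, 0) + coeff  # one term per variable

    out = []
    for var, coeff in sorted(merged.items()):
        if coeff > 0:
            out.append((coeff, var))
        elif coeff < 0:
            # c * x = c + |c| * (not x): flip the literal, shift the bound.
            out.append((-coeff, -var))
            bound -= coeff
    return out, bound


def holds(terms, bound, assignment):
    """Evaluate sum(c * lit) <= bound under assignment {var: 0 or 1}."""
    total = 0
    for coeff, lit in terms:
        value = assignment[abs(lit)]
        total += coeff * (value if lit > 0 else 1 - value)
    return total <= bound
```

For instance, `normalize_leq([(2, 1), (-3, 2), (4, -3)], 1)` returns a
constraint with weights 2, 3 and 4 over the literals x1, not x2 and not x3.
The resulting bound may be zero or negative, in which case the constraint is
trivially restrictive or unsatisfiable and would be handled before encoding.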

This paper makes the following contributions. 

- We propose an arc-consistency [12] preserving extension of the Totalizer
encoding [5] called the Generalized Totalizer encoding (GTE) in Section 2.

- We compare various PBC encoding schemes that were implemented in a
common framework, thus providing a fair comparison. After discussing related
work in Section 3, we show GTE as a promising encoding through its
competitive performance in Section 4.


2 Generalized Totalizer Encoding 

The Totalizer encoding [5] is an encoding to convert cardinality constraints
into a SAT formula. In this section, the generalized Totalizer encoding (GTE)
to encode PBCs into SAT is presented. GTE can be better visualized as a binary
tree, as shown in Fig. 1. With the exception of the leaves, every node is
represented as (node_name : node_vars : node_sum). The node_sum for every node
represents the maximum possible weighted sum of the subtree rooted at that
node. For any node A, a node variable $a_w$ represents a weighted sum w of the
underlying subtree. In other words, whenever the weighted sum of some of the
input literals in the subtree becomes w, $a_w$ must be set to 1. Note that for
any node A, we would need one variable corresponding to every distinct
weighted sum that the input literals under A can produce. Input literals are
at the leaves, represented as (node_name : literal_name : literal_weight),
with each of the terms being self-explanatory.

For any node P with children Q and R, to ensure that the weighted sum is
propagated from Q and R to P, the following formula is built for P:

$$
\left( \bigwedge_{\substack{q_{w_1} \in Q.node\_vars \\ r_{w_2} \in R.node\_vars \\ w_3 = w_1 + w_2 \\ p_{w_3} \in P.node\_vars}} (\neg q_{w_1} \vee \neg r_{w_2} \vee p_{w_3}) \right)
\wedge
\left( \bigwedge_{\substack{s_w \in (Q.node\_vars \cup R.node\_vars) \\ w = w' \\ p_{w'} \in P.node\_vars}} (\neg s_w \vee p_{w'}) \right) \qquad (2)
$$

The left part of Eqn. 2 ensures that, if node Q has witnessed a weighted
sum of $w_1$ and R has witnessed a weighted sum of $w_2$, then P must be
considered to have witnessed the weighted sum of $w_3 = w_1 + w_2$. The right
part of Eqn. 2 just takes care of the boundary condition where weighted sums
from Q and R are propagated to P without combining them with their siblings.
This represents that Q (resp. R) has witnessed a weighted sum of w but R
(resp. Q) may not have witnessed any positive weighted sum.

Note that node O in Fig. 1 does not have variables for the weighted sums
larger than 6. Once the weighted sum goes above the threshold of k, we
represent it with k + 1. Since all the weighted sums above k would result in
the constraint being not satisfied, it is sound to represent all such sums as
k + 1. This is in some sense a generalization of k-simplification. For
k-simplification, $w_3$ in Eqn. 2 would change to
$w_3 = \min(w_1 + w_2, k + 1)$.

Finally, to enforce that the weighted sum does not exceed the given threshold
k, we add the following constraint at the root node O:

$$\neg o_{k+1} \qquad (3)$$

Encoding properties: Let $A_{I_w}$ represent the multiset of weights of all
the input literals in the subtree rooted at node A. For any given multiset S
of weights, let $Weight(S) = \sum_{e \in S} e$. For a given multiset S, let
unique(S) denote the set with all the multiplicity removed from S. Let |S|
denote the cardinality of the set S. Hence, the total number of node variables
required at node A is:

$$|unique(\{Weight(S) \mid S \subseteq A_{I_w} \wedge S \neq \emptyset\})| \qquad (4)$$

Note that, unlike some other encodings, the number of auxiliary variables
required for GTE does not depend on the magnitudes of the weights. Instead, it
depends on how many unique weighted sums can be generated. Thus, we claim that
for pseudo-Boolean constraints where the number of distinct weighted sum
combinations is low, GTE should perform better. We corroborate our claim in
Section 4 through experiments.
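To make the clause construction of Eqns. 2 and 3 concrete, the following
Python sketch builds the clauses for a small constraint. It is only an
illustration under stated assumptions: the function name `gte_encode`, the
DIMACS-style integer literals, and the balanced splitting of the input
literals into a binary tree are choices made here, not details taken from the
paper or from PBLib. k-simplification is applied at every internal node.

```python
def gte_encode(weights, k):
    """Sketch of GTE for sum(weights[i] * x_(i+1)) <= k.

    Input literal i (0-based) is the positive DIMACS variable i + 1.
    Returns (clauses, root, num_vars); root maps each weighted sum kept
    at the root node to its variable.
    """
    clauses = []
    next_var = [len(weights)]            # highest variable id used so far

    def cap(w):
        return min(w, k + 1)             # k-simplification

    def build(lo, hi):
        if hi - lo == 1:                 # leaf: (node : literal : weight)
            return {weights[lo]: lo + 1}
        mid = (lo + hi) // 2             # assumption: balanced binary tree
        q, r = build(lo, mid), build(mid, hi)
        sums = {cap(w) for w in q} | {cap(w) for w in r}
        sums |= {cap(w1 + w2) for w1 in q for w2 in r}
        p = {}
        for w in sorted(sums):           # one fresh variable per distinct sum
            next_var[0] += 1
            p[w] = next_var[0]
        for w1, qv in q.items():         # left part of Eqn. 2
            for w2, rv in r.items():
                clauses.append([-qv, -rv, p[cap(w1 + w2)]])
        for child in (q, r):             # right part of Eqn. 2
            for w, sv in child.items():
                clauses.append([-sv, p[cap(w)]])
        return p

    root = build(0, len(weights))
    for w, var in root.items():          # Eqn. 3: forbid sums above k
        if w > k:
            clauses.append([-var])
    return clauses, root, next_var[0]
```

For the constraint of Fig. 1, `gte_encode([2, 3, 3, 3], 5)` produces root
variables for the sums 2, 3, 5 and 6, which matches node O in the figure.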
Nevertheless, in the worst case, GTE can generate exponentially many auxiliary
variables and clauses. For example, if the weights of input literals
$l_1, \ldots, l_n$ are respectively $2^0, \ldots, 2^{n-1}$, then every possible
weighted sum combination would be unique. In this case, GTE would generate
exponentially many auxiliary variables. Since every variable is used in at
least one clause, it will also generate exponentially many clauses.

Though GTE does not depend on the magnitudes of the weights, one can use the
magnitude of the largest weight to characterize a class of PBCs for which GTE
is guaranteed to be of polynomial size. If there are n input literals and the
largest weight is bounded by a polynomial P(n), then every weighted sum is an
integer between 1 and nP(n). The total number of distinct weight combinations
(Eqn. 4) is therefore bounded by nP(n), resulting in a polynomial size
formula.

The best case for GTE occurs when all of the weights are equal, in which case
the number of auxiliary variables and clauses is, respectively,
$O(n \log_2 n)$ and $O(n^2)$. Notice that for this best case with
k-simplification, we have $O(nk)$ variables and clauses, since it will behave
exactly as the Totalizer encoding [5].
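The worst, bounded and best cases discussed above can be checked numerically
by counting distinct non-empty subset sums, which is the quantity in Eqn. 4
evaluated at the root. The helper name `distinct_sums` is hypothetical, and
the count deliberately ignores k-simplification.

```python
def distinct_sums(weights):
    """Number of distinct non-empty subset sums of a multiset of positive weights."""
    sums = {0}                           # sum of the empty subset
    for w in weights:
        sums |= {s + w for s in sums}    # either skip w or add it
    return len(sums) - 1                 # drop the empty subset


n = 10
worst = distinct_sums([2 ** i for i in range(n)])   # powers of two: 2**n - 1 sums
best = distinct_sums([7] * n)                       # equal weights: n sums
fig1 = distinct_sums([2, 3, 3, 3])                  # constraint of Fig. 1

# With positive integer weights, the count never exceeds n * max(weights).
assert fig1 <= 4 * 3
```

With equal weights the root needs only n variables regardless of how large the
common weight is, whereas powers of two force one variable per subset.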

Note also that the generalized arc consistency (GAC) [12] property of the
Totalizer encoding holds for GTE as well. GAC is a property of an encoding
which allows the solver to infer the maximal possible information through
propagation, thus helping the solver to prune the search space earlier. The
original proof [5] makes an inductive argument using the left subtree and the
right subtree of a node. It makes use of the fact that, if there are q input
variables set to 1 in the left child Q and r input variables set to 1 in the
right child R, then the encoding ensures that in the parent node P, the
variable $p_{q+r}$ is set to 1. Similarly, GTE ensures that if the left child
Q contributes $w_1$ to the weighted sum ($q_{w_1}$ is set to 1) and the right
child R contributes $w_2$ to the weighted sum ($r_{w_2}$ is set to 1), then
the parent node P registers the weighted sum to be at least
$w_3 = w_1 + w_2$ ($p_{w_3}$ is set to 1). Hence, the GAC proof still holds
for GTE.
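The effect of GAC can be observed on the example of Fig. 1 with plain unit
propagation. The sketch below writes out the clauses of Eqns. 2 and 3 for that
tree by hand (with k-simplification, k = 5) and runs a naive propagation loop.
The string literal convention ("-" prefix for negation) and the function name
`unit_propagate` are assumptions of this illustration, not part of any solver
API.

```python
# Clauses for 2*l1 + 3*l2 + 3*l3 + 3*l4 <= 5, following the tree of Fig. 1.
CLAUSES = [
    ["-l1", "-l2", "a5"], ["-l1", "a2"], ["-l2", "a3"],           # node A
    ["-l3", "-l4", "b6"], ["-l3", "b3"], ["-l4", "b3"],           # node B
    ["-a2", "-b3", "o5"], ["-a2", "-b6", "o6"],                   # node O, left part
    ["-a3", "-b3", "o6"], ["-a3", "-b6", "o6"],
    ["-a5", "-b3", "o6"], ["-a5", "-b6", "o6"],
    ["-a2", "o2"], ["-a3", "o3"], ["-a5", "o5"],                  # node O, right part
    ["-b3", "o3"], ["-b6", "o6"],
    ["-o6"],                                                      # Eqn. 3
]


def unit_propagate(clauses, assignment):
    """Extend assignment (name -> bool) by unit propagation.

    Returns the extended assignment, or None if some clause is falsified.
    """
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                name, want = lit.lstrip("-"), not lit.startswith("-")
                if name not in assignment:
                    unassigned.append((name, want))
                elif assignment[name] == want:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return None                  # conflict: clause is falsified
            if len(unassigned) == 1:         # unit clause: force the literal
                name, want = unassigned[0]
                assignment[name] = want
                changed = True
    return assignment
```

Setting only l2 to true leaves a slack of 2, so propagation alone forces l3
and l4 (weight 3 each) to false while l1 (weight 2) stays unassigned. Setting
l3 and l4 to true exceeds the bound and the function returns None.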

3 Related Work 

The idea of encoding a PBC into a SAT formula is not new. One of the first
such encodings uses a binary adder circuit like formulation to compute the
weighted sum and then compare it against the threshold k. This encoding
creates $O(n \log_2 k)$ auxiliary clauses, but it is not arc-consistent.
Another approach to encode PBCs into SAT is to use sorting networks. This
encoding produces $O(N \log_2 N)$ auxiliary clauses, where N is bounded by
$\lceil \log_2 w_1 \rceil + \ldots + \lceil \log_2 w_n \rceil$. This encoding
is also not arc-consistent for PBCs, but it preserves more implications than
the adder encoding, and it maintains GAC for cardinality constraints.

The Watchdog encoding [7] scheme uses the Totalizer encoding, but in a
completely different manner than GTE. It uses multiple Totalizers, one for
each bit of the binary representation of the weights. The Watchdog encoding
was the first polynomial sized encoding that maintains GAC for PBCs and it
only generates $O(n^3 \log_2 n \log_2 w_{max})$ auxiliary clauses. Recently,
the Watchdog encoding has been generalized to a more abstract framework with
the Binary Merger encoding [20]. Using a different translation of the
components of the Watchdog encoding allows the Binary Merger encoding to
further reduce the number of auxiliary clauses to
$O(n^2 \log_2^2 n \log_2 w_{max})$. The Binary Merger is also polynomial and
maintains GAC.

Generalized Totalizer Encoding for Pseudo-Boolean Constraints

Other encodings that maintain GAC can be exponential in the worst case
scenario, such as BDD based encodings. These encodings share quite a lot of
similarities with GTE, such as GAC and independence from the magnitude of the
weights. One of the differences is that GTE always has a tree like structure
amongst auxiliary variables and input literals. However, the crucial
difference lies in the manner in which auxiliary variables are generated, and
what they represent. In BDD based approaches, an auxiliary variable $D_i$
attempts to reason about the weighted sum of the input literals either
$l_i, \ldots, l_n$ or $l_1, \ldots, l_i$. On the other hand, an auxiliary
variable $a_w$ at a node A in GTE attempts to only reason about the weighted
sum of the input literals that are descendants of A. Therefore, two auxiliary
variables in two disjoint subtrees in GTE are guaranteed to reason about
disjoint sets of input literals. We believe that such localized reasoning
could be a cause of the relatively better performance of GTE as reported in
Section 4. It is worth noting that the worst case scenario for GTE, when
weights are of the form $a^i$ where $a \ge 2$, would generate a polynomial
size formula for BDD based approaches.

As GTE generalizes the Totalizer encoding, the Sequential Weighted Counter
(SWC) encoding [2] generalizes the sequential encoding [25] for PBCs. Like BDD
based approaches and GTE, SWC can be exponential in the worst case.

4 Implementation and Evaluation 

All experiments were performed on two AMD 6276 processors (2.3 GHz) running
Fedora 18 with a timeout of 1,800 seconds and a memory limit of 16 GB. Similar
resource limitations were used during the last pseudo-Boolean (PB) evaluation
of 2012 (footnote 3). For a fair comparison, we implemented GTE (gte) in the
PBLib [29] (version 1.2) open source library, which contains a plethora of
encodings, namely, Adder Networks (adder), Sorting Networks (sorter), Watchdog
(watchdog) [7], Binary Merger (bin-merger) [20], Sequential Weighted Counter
(swc) [2], and BDDs (bdd). A new encoding can be added to PBLib by
implementing the encode method of the base class Encoder. Thus, all the
encodings mentioned above, including GTE, differ only in how encode is
implemented, while they share the rest of the environment. PBLib provides
parsing and normalization routines for PBCs and uses Minisat 2.2.0 as a
back-end SAT solver. When the constraint to be encoded into CNF is a
cardinality constraint, we use the default setting of PBLib, which dynamically
selects a cardinality encoding based on the number of auxiliary clauses. When
the constraint to be encoded into CNF is a PBC, we specify one of the above
encodings.

3 http://www.cril.univ-artois.fr/PB12/ 
Table 1: Characteristics of pseudo-Boolean benchmarks

Benchmark   #PB      #lits       k           max w_i   Σ w_i          #diff w_i
PB’12       164.31   32.25       27.94       12.55     167.14         6.72
pedigree    1.00     10,794.13   11,106.69   456.28    4,665,237.38   2.00

Table 2: Number of solved instances

Benchmark        Result      sorter  swc  adder  watchdog  bin-merger  bdd  gte
PB’12 (214)      SAT         72      74   73     79        79          81   81
                 UNSAT       74      77   83     85        85          84   84
pedigree (172)   SAT         2       7    6      25        43          82   83
                 UNSAT       0       7    6      23        35          72   75
Total            SAT/UNSAT   146     165  172    212       242         319  323


Benchmarks: Out of all 355 instances from the DEC-SMALLINT-LIN category in
the last PB evaluation of 2012 (PB’12), we only considered those 214 instances
(footnote 4) that contain at least 1 PBC. We also consider an additional set
of pedigree benchmarks from computational biology. These benchmarks were
originally encoded in Maximum Satisfiability (MaxSAT) and were used in the
last MaxSAT Evaluation of 2014 (footnote 5). Any MaxSAT problem can be
converted to a corresponding equivalent pseudo-Boolean problem [2]. We
generate two pseudo-Boolean decision problems (one satisfiable, another
unsatisfiable) from the optimization version of each of these benchmarks. The
optimization function is transformed into a PBC with the value of the bound k
set to a specific value. Let the optimum value for the optimization function
be $k_{opt}$. The satisfiable decision problem uses $k_{opt}$ as the value for
the bound k, whereas the unsatisfiable decision problem uses $k_{opt} - 1$ as
the value for the bound k. Out of 200 generated instances (footnote 6), 172
had at least 1 PBC and were selected for further evaluation.
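The derivation of the two decision problems from one optimization problem can
be sketched on a toy instance. The brute-force search below merely stands in
for an optimizer, and the names `decision_bounds` and `satisfiable`, as well
as the `feasible` predicate representing the remaining constraints, are
hypothetical. The objective is assumed to be minimized.

```python
from itertools import product


def cost(weights, xs):
    """Value of the linear objective sum(w_i * x_i)."""
    return sum(w * x for w, x in zip(weights, xs))


def decision_bounds(weights, feasible):
    """Return (k_opt, k_opt - 1): the bounds of the SAT and UNSAT instances.

    feasible: predicate on a tuple of 0/1 values standing in for the
              remaining constraints of the instance.
    """
    k_opt = min(cost(weights, xs)
                for xs in product((0, 1), repeat=len(weights))
                if feasible(xs))
    return k_opt, k_opt - 1


def satisfiable(weights, feasible, k):
    """Decision problem: is there a feasible x with objective value <= k?"""
    return any(feasible(xs) and cost(weights, xs) <= k
               for xs in product((0, 1), repeat=len(weights)))


# Toy instance: at least one of x1, x2 and at least one of x3, x4 must hold.
weights = [3, 1, 3, 1]
feasible = lambda xs: (xs[0] or xs[1]) and (xs[2] or xs[3])
k_sat, k_unsat = decision_bounds(weights, feasible)
```

By construction the objective PBC with bound `k_sat` is satisfiable together
with the remaining constraints, while the one with bound `k_unsat` is not.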

Table 1 shows the characteristics of the benchmarks used in this evaluation.
#PB denotes the average number of PBCs per instance. #lits, k, max w_i, Σ w_i
and #diff w_i denote the per constraint per instance average of input
literals, bound, the largest weight, maximum possible weighted sum and the
number of distinct weights, respectively. PB’12 benchmarks are a mix of
crafted as well as industrial benchmarks, whereas all of the pedigree
benchmarks are from the same biological problem [15]. The PB’12 benchmarks
have on average several PBCs; however, they are relatively small in magnitude.
In contrast, the pedigree benchmarks contain one large PB constraint with a
very large total weighted sum. The pedigree benchmarks have only two distinct
values of weights, thus making them good candidates for using GTE.
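The per-constraint quantities behind the columns of Table 1 are simple to
compute. The sketch below shows them for a list of constraints given as
(weights, k) pairs; the function names and the dictionary keys are assumptions
of this illustration, and it does not reproduce the benchmark parsing used in
the evaluation.

```python
def constraint_stats(weights, k):
    """Quantities of one PBC corresponding to the columns of Table 1."""
    return {
        "#lits": len(weights),           # number of input literals
        "k": k,                          # bound
        "max w_i": max(weights),         # largest weight
        "sum w_i": sum(weights),         # maximum possible weighted sum
        "#diff w_i": len(set(weights)),  # number of distinct weights
    }


def average_stats(constraints):
    """Average each quantity over a non-empty list of (weights, k) constraints."""
    rows = [constraint_stats(w, k) for w, k in constraints]
    return {key: sum(row[key] for row in rows) / len(rows) for key in rows[0]}
```

For the constraint of Fig. 1, `constraint_stats([2, 3, 3, 3], 5)` reports 4
literals, a largest weight of 3, a total weight of 11 and 2 distinct weights.
A low "#diff w_i" relative to "#lits" is the characteristic that favors GTE.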


Results: Table 2 shows the number of instances solved using different
encodings. sorter, adder and swc perform worse than the remaining encodings
for both sets of benchmarks. The first two are not arc-consistent; therefore,
the SAT solver is not able to infer as much information as with
arc-consistent encodings. swc, though arc-consistent, generates a large number
of auxiliary variables and clauses, which deteriorates the performance of the
SAT solver.

4 Available at http://sat.inesc-id.pt/~ruben/benchmarks/pbl2-subset.zip

5 http://www.maxsat.udl.cat/14/

6 Available at http://sat.inesc-id.pt/~ruben/benchmarks/pedigrees.zip

Fig. 2: Cactus plots of number of variables, number of clauses and runtimes.
Panels: (a) # Variables on PB’12 benchmarks, (b) # Variables on pedigree
benchmarks, (c) # Clauses on PB’12 benchmarks, (d) # Clauses on pedigree
benchmarks, (e) Runtime on PB’12 benchmarks, (f) Runtime on pedigree
benchmarks. Axis label: instances.

Saurabh Joshi, Ruben Martins, and Vasco Manquinho

gte provides competitive performance compared to bdd, bin-merger and watchdog
on PB’12. However, only the gte and bdd encodings are able to tackle the
pedigree benchmarks, which contain a large number of literals and only two
different coefficients. Unlike other encodings, gte and bdd are able to
exploit the characteristics of these benchmarks.

swc requires a significantly larger number of variables as the value of k
increases, whereas bdd and gte keep the variable explosion in check due to
reuse of variables on similar combinations (Figs. 2a and 2b). This reuse of
auxiliary variables is even more evident on pedigree benchmarks (Fig. 2b), as
these benchmarks have only two different coefficients, resulting in a low
number of combinations. k-simplification also helps gte in keeping the number
of variables low, as all the combinations weighing more than k are mapped to
k + 1.

The number of clauses required for gte is quite large compared to some other
encodings (Figs. 2c and 2d). gte requires clauses to be generated for all the
combinations even though most of them produce the same value for the weighted
sum, thus reusing the same variable. Though bdd has an exponential worst case,
in practice it appears to generate smaller formulas (Figs. 2c and 2d).

Fig. 2e shows that gte provides competitive performance with respect to
bin-merger, watchdog and bdd. Runtime on pedigree benchmarks, as shown in
Fig. 2f, establishes gte as the clear winner with bdd a close second. The
properties that gte and bdd share help them perform better on pedigree
benchmarks, as they are not affected by the large magnitude of weights in the
PBCs.


5 Conclusion 

Many real-world problems can be formulated using pseudo-Boolean constraints
(PBCs). Given the advances in SAT technology, it becomes crucial to encode
PBCs into SAT in such a way that SAT solvers can efficiently solve the
resulting formula.

In this paper, an arc-consistency preserving generalization of the Totalizer
encoding is proposed for encoding PBCs into SAT. Although the proposed
encoding is exponential in the worst case, the new Generalized Totalizer
encoding (GTE) is very competitive in relation to other PBC encodings.
Moreover, experimental results show that when the number of different weights
in a PBC is small, it clearly outperforms all other encodings. As a result, we
believe the impact of GTE can be extensive, since one can further extend it
into incremental settings [55].